Introduction to Distributional Regression

Lecture 1

Gillian Heller

NHMRC Clinical Trials Centre, University of Sydney

gillian.heller@sydney.edu.au

Acknowledgements

Acknowledgement to Country


I wish to acknowledge the Ngunnawal people as traditional custodians of the land we are meeting on and recognise any other people or families with connection to the lands of the ACT and region.


I wish to acknowledge and respect their continuing culture and the contribution they make to the life of this city and this region.


I would also like to acknowledge and welcome other Aboriginal and Torres Strait Islander people who may be attending today’s event.

Collaborators

GAMLSS Working Party

  • Mikis Stasinopoulos (Univ Greenwich, UK)
  • Bob Rigby (Univ Greenwich, UK)
  • Thomas Kneib (Univ Göttingen, Germany)
  • Nikolaus (Niki) Umlauf (Univ Innsbruck, Austria)
  • Achim Zeileis (Univ Innsbruck, Austria)
  • Reto Stauffer (Univ Innsbruck, Austria)
  • Gillian Heller (University of Sydney)
  • Andreas Mayr (Univ Marburg, Germany)
  • Fernanda de Bastiani (UFP, Brazil)

Materials for today’s short course have been inspired by similar short courses given by some of the above colleagues, and some sections have been “borrowed” from their generously shared materials, and from the books that we have written jointly.

Plan for the day


8:45 - 9am Tea/coffee on arrival
9 - 10am Lecture
10 - 10:30am Morning tea
10:30 - 12pm Lecture & practical
12 - 1pm Lunch
1 - 2:30pm Lecture & practical
2:30 - 3pm Afternoon tea
3 - 4:30pm Lecture & practical
4:30pm Short course end

Contents


Lecture 1 Introduction to distributional regression, GAMLSS
Lecture 2 The GAMLSS model, continuous response distributions
Lecture 3 Selecting response distribution, model terms
Practical 1 Plasma data set: select response distribution and covariates
Lecture 4 Model diagnostics
Practical 2 Plasma data set: diagnostics, parameter interpretation
Lecture 5 Discrete response distributions, APTS study
Practical 3 Mixed distributions; speech intelligibility data set

Introduction to distributional regression and GAMLSS

Linear regression

\begin{align*} y_{i}&\sim\mathcal{N}\left(\mu_i,\sigma^2\right)& \text { independently, for } i=1, \ldots, n \\ \mathbb{E}(y_i)=\mu_i&=\beta_0 + \beta_1 x_{i1}+\ldots+\beta_p x_{ip} \end{align*}


Traditional notation:

\begin{align*} y_i&=\beta_0 + \beta_1 x_{i1}+\ldots+\beta_p x_{ip}+\epsilon_i\\ \epsilon_i&\sim\mathcal{N}\left(0,\sigma^2\right)& \text { independently, for } i=1, \ldots, n \end{align*}
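As a minimal sketch of the model above with a single covariate, the following Python snippet simulates data from y_i = \beta_0 + \beta_1 x_i + \epsilon_i and recovers the coefficients by least squares. It is not part of the course materials; the function name, the simulated data and the "true" values (2.0, 0.5, 1.0) are all assumptions chosen for illustration.

```python
import random
import statistics


def fit_simple_linear_regression(x, y):
    """Least-squares estimates for y_i = beta0 + beta1 * x_i + eps_i."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar
    residuals = [yi - (beta0 + beta1 * xi) for xi, yi in zip(x, y)]
    # Residual variance uses n - 2 degrees of freedom (two fitted coefficients).
    sigma = (sum(r ** 2 for r in residuals) / (len(x) - 2)) ** 0.5
    return beta0, beta1, sigma


# Simulated data; the "true" values below are illustrative assumptions.
rng = random.Random(1)
x = [rng.uniform(0, 10) for _ in range(500)]
y = [2.0 + 0.5 * xi + rng.gauss(0, 1.0) for xi in x]

beta0_hat, beta1_hat, sigma_hat = fit_simple_linear_regression(x, y)
print(f"beta0 = {beta0_hat:.3f}, beta1 = {beta1_hat:.3f}, sigma = {sigma_hat:.3f}")
```

Note that only the mean \mu_i depends on x here; \sigma is a single constant for all observations.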

Non-normal regression (“The new normal”)

  • Distribution of y can be non-normal

  • Relationship between y and x can be nonlinear

  • Covariate x can affect mean, spread and shape of the distribution

Generalized linear model (GLM)

\begin{align*} y_{i}&\sim \mathcal{E}\left(\mu_i,\sigma\right) \\ g(\mu_i)&=\beta_0 + \beta_1 x_{i1}+\ldots+\beta_p x_{ip} \end{align*}

  • \mathcal{E} = exponential family (e.g. normal, Bernoulli, Poisson, gamma, inverse Gaussian)

  • \mu_i=\mathbb{E}(y_i)

  • \sigma = dispersion parameter

  • g(\cdot) = link function (monotonic, differentiable)
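To make the role of the link function concrete, here is a small Python sketch of two commonly used links and their inverses. The linear predictor \eta_i = \beta_0 + \beta_1 x_i lives on the whole real line, and g^{-1} maps it back to the valid range of \mu_i. The coefficient values are hypothetical, chosen only for illustration.

```python
import math


def log_link(mu):
    """g(mu) = log(mu), for means restricted to (0, inf), e.g. Poisson or gamma."""
    return math.log(mu)


def inv_log_link(eta):
    return math.exp(eta)


def logit_link(mu):
    """g(mu) = log(mu / (1 - mu)), for means restricted to (0, 1), e.g. Bernoulli."""
    return math.log(mu / (1.0 - mu))


def inv_logit_link(eta):
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical coefficients for illustration only.
beta0, beta1 = -1.0, 0.3

for x in (0.0, 5.0, 10.0):
    eta = beta0 + beta1 * x  # linear predictor, any real number
    print(f"x = {x:4.1f}  eta = {eta:5.2f}  "
          f"mu (log link) = {inv_log_link(eta):6.3f}  "
          f"mu (logit link) = {inv_logit_link(eta):5.3f}")
```

Both links are monotonic and differentiable, so each value of the linear predictor corresponds to exactly one value of the mean.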


Generalized additive model (GAM)

\begin{align*} y_{i}&\sim \mathcal{E}\left(\mu_i,\sigma\right) \\ g(\mu_i)&=\beta_0 + s_1(x_{i1})+\ldots+s_p(x_{ip}) \end{align*}

  • s_j(x_{ij}) = smooth function or parametric form of x_{ij}

Generalized Additive Models for Location, Scale and Shape (GAMLSS)

\begin{align*} y_{i}&\sim \mathcal{D}\left(\mu_i,\sigma_i,\nu_i,\tau_i\right) \\ g_1(\mu_i)&=\beta_0^{\mu}+s_1^{\mu}(x_{i1})+\ldots+s_p^{\mu}( x_{ip})\\ g_2(\sigma_i)&=\beta_0^{\sigma}+s_1^{\sigma}(x_{i1})+\ldots+s_p^{\sigma}( x_{ip})\\ g_3(\nu_i)&=\beta_0^{\nu}+s_1^{\nu}(x_{i1})+\ldots+s_p^{\nu}( x_{ip})\\ g_4(\tau_i)&=\beta_0^{\tau}+s_1^{\tau}(x_{i1})+\ldots+s_p^{\tau}( x_{ip}) \end{align*}

  • \mathcal{D} is any distribution with a computable log-likelihood and computable first and second derivatives with respect to its parameters

  • \mathcal{D} can have any number of parameters (4 are shown above)

  • g_k(\cdot) are appropriate link functions

  • s_j^k(x_{ij}) are smooth functions (e.g. splines) or parametric forms
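The key idea, that every distribution parameter can have its own predictor, can be sketched in Python for the simplest case: a normal response with \mu_i (identity link) and \sigma_i (log link) both linear in x. This is an illustration only, not the gamlss fitting algorithm. The function names, simulated data and parameter values are assumptions, and the log-likelihood is evaluated at the known simulation values rather than maximised.

```python
import math
import random


def normal_logpdf(y, mu, sigma):
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2


def loglik_location_scale(x, y, beta, gamma):
    """Log-likelihood with mu_i = beta[0] + beta[1] * x_i (identity link)
    and log(sigma_i) = gamma[0] + gamma[1] * x_i (log link)."""
    total = 0.0
    for xi, yi in zip(x, y):
        mu_i = beta[0] + beta[1] * xi
        sigma_i = math.exp(gamma[0] + gamma[1] * xi)
        total += normal_logpdf(yi, mu_i, sigma_i)
    return total


# Simulated heteroscedastic data; all parameter values are illustrative assumptions.
rng = random.Random(42)
true_beta = (1.0, 0.5)
true_gamma = (-1.0, 0.3)
x = [rng.uniform(0, 10) for _ in range(1000)]
y = [rng.gauss(true_beta[0] + true_beta[1] * xi,
               math.exp(true_gamma[0] + true_gamma[1] * xi)) for xi in x]

# Constant-sigma comparison: same mean line, one pooled sigma (gamma[1] = 0).
resid = [yi - (true_beta[0] + true_beta[1] * xi) for xi, yi in zip(x, y)]
sigma_const = math.sqrt(sum(r * r for r in resid) / len(resid))

ll_varying = loglik_location_scale(x, y, true_beta, true_gamma)
ll_constant = loglik_location_scale(x, y, true_beta, (math.log(sigma_const), 0.0))
print(f"log-likelihood, sigma depends on x: {ll_varying:.1f}")
print(f"log-likelihood, constant sigma:     {ll_constant:.1f}")
```

When the spread really does change with x, as in this simulation, letting \sigma_i depend on x gives a much higher log-likelihood than forcing a single \sigma.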

Linear regression

\begin{align*} y_{i}&\sim\mathcal{N}\left(\mu_i,\sigma^2\right)\\\\ \mathbb{E}(y_i)=\mu_i&=\beta_0 + \beta_1 x_{i} \end{align*}

  • Shaded area is the middle 80% of the fitted distribution (10th to 90th percentile)
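For a fitted normal model, the middle 80% band runs from the 10th to the 90th percentile of \mathcal{N}(\mu_i, \sigma^2) at each x. A minimal Python sketch, using hypothetical fitted values (\beta_0 = 2, \beta_1 = 0.5, \sigma = 1) that are not taken from the lecture's data:

```python
from statistics import NormalDist


def central_interval(mu, sigma, coverage=0.80):
    """Lower and upper quantiles enclosing the central `coverage` of N(mu, sigma^2)."""
    tail = (1.0 - coverage) / 2.0
    dist = NormalDist(mu, sigma)
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


# Hypothetical fitted values, for illustration only.
beta0, beta1, sigma = 2.0, 0.5, 1.0

for x in (0.0, 5.0, 10.0):
    mu = beta0 + beta1 * x
    lower, upper = central_interval(mu, sigma)
    print(f"x = {x:4.1f}  mu = {mu:5.2f}  80% band = ({lower:5.2f}, {upper:5.2f})")
```

Because \sigma is constant in linear regression, the band has the same width at every x; it only shifts with the mean.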

Generalized linear model

\begin{align*} y_{i}&\sim \text{GA}\left(\mu_i,\sigma\right) \\\\ \log(\mu_i)&=\beta_0 + \beta_1 x_{i} \end{align*}

Generalized additive model

\begin{align*} y_{i}&\sim \text{GA}\left(\mu_i,\sigma\right) \\\\ \log(\mu_i)&=\beta_0 + s(x_{i}) \end{align*}

GAMLSS

\begin{align*} y_{i}&\sim \text{BCTo}\left(\mu_i,\sigma_i, \nu_i, \tau\right) \\\\ \log(\mu_i)&=\beta_0^\mu + s^\mu(x_{i})\\ \log(\sigma_i)&=\beta_0^\sigma + s^\sigma(x_{i})\\ \log(\nu_i)&=\beta_0^\nu + \beta_1^\nu x_{i} \end{align*}

A real example

https://doi.org/10.1186/s12874-020-01021-y

A history of regression modelling

Frameworks for Distributional Modelling

There are different frameworks that enable distributional regression modelling:

  • generalized additive models for location, scale and shape (GAMLSS),

  • quantile and expectile regression,

  • conditional transformation models, and

  • various other forms.

We will focus on GAMLSS-type models.

GAMLSS development

  • 2005 seminal paper by Rigby & Stasinopoulos (JRSS Series C, Applied Statistics)

  • 2005 gamlss package released on CRAN (Stasinopoulos & Rigby)

  • 2018 bamlss (Nikolaus Umlauf et al.)
  • 2023 GAMLSS Working Party formed with the purpose of

    • updating gamlss
    • improving documentation
    • eventually integrating gamlss and bamlss
  • gamlss2 version 0.1-0 is currently on GitHub

  • 2025 (we hope) gamlss2 released on CRAN

GAMLSS articles and citations

GAMLSS resources

GAMLSS software

This document has been written by Nikolaus Umlauf and Thomas Kneib:


R-Packages.pdf

GAMLSS books

Books published in 2017, 2019 and 2024 (full citations are listed under Journal articles and books).

Website

https://www.gamlss.com/

Journal articles and books

Hohberg, M., Pütz, P., & Kneib, T. (2020). Treatment effects beyond the mean using distributional regression: Methods and guidance. PLoS ONE, 15(2), e0226514.
Kneib, T. (2013). Beyond mean regression (with discussion and rejoinder). Statistical Modelling, 13(4), 275–303.
Rigby, R.A., & Stasinopoulos, D.M. (2005). Generalized additive models for location, scale and shape (with discussion). Applied Statistics, 54, 507–554.
Rigby, R.A., Stasinopoulos, D.M., Heller, G.Z., & De Bastiani, F. (2019). Distributions for modeling location, scale, and shape: Using GAMLSS in R. Boca Raton: Chapman & Hall/CRC.
Stasinopoulos, D.M., & Rigby, R.A. (2007). Generalized additive models for location scale and shape (GAMLSS) in R. Journal of Statistical Software, 23(7), 1–46.
Stasinopoulos, D.M., Rigby, R.A., Heller, G.Z., Voudouris, V., & De Bastiani, F. (2017). Flexible regression and smoothing: Using GAMLSS in R. Boca Raton: Chapman & Hall/CRC.
Stasinopoulos, M.D., Kneib, T., Klein, N., Mayr, A., & Heller, G.Z. (2024). Generalized additive models for location, scale and shape: A distributional regression approach, with applications (Vol. 56). Cambridge University Press.